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Abstract 

The paper presents a new proof of O’Cinneide’s characterization theorem [7]. It is much simpler 
than the original one and constructive in the sense that we not only show the existence of a phase type 
representation, but present a procedure which creates a phase type representation. We prove that the 
procedure succeeds when the conditions of the characterization theorem hold. 
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1 Introduction 

The characterization theorem of O’Cinneide [7] proves that any finite order matrix exponential function 
which is strictly positive in (0, oo) and satisfies the dominant eigenvalue condition has a finite dimensional 
phase type (PH) representation. Based on this theorem Mocanu and Commault [2] proposed a procedure 
for computing the PH representation of such matrix exponential function. A quite different approach from 
Maier [3] proposes a similar procedure based on Soittola’s automata-theoretic algorithms [8]. All of these 
papers prove the characterization theorem, but use complex mathematical concepts, such as polytopes, or 
positive rational sequences. Additionally, both procedures in [2] and in [3] are implicit in the sense that an 
essential parameter (r in [2] and c in [3]) are found as a result of a numerical search. 

In this paper we present a constructive proof of the characterization theorem by proposing an explicit 
procedure for computing a phase type (PH) representation of a matrix exponential function and showing 
that the procedure always terminates successfully if the matrix exponential function satisfies the positivity 
and the eigenvalue conditions. 

Compared to the existing resuls, one of the main advantages of the presented constructive proof is that 
it is rather elementary, using basic function and matrix theory and stochastic interpretation of Markov 
processes. Another contribution of the paper is that it links the sparse monocyclic representation [2] to the 
characterization theorem [7]. 

2 Preliminaries 

Definition 1. Let X be a non-negative random variable with probability density function (pdf) 

fx{x) = ApiTA < a;) = —aAe^^l, a: > 0, 
ax 

where a is an initial row vector of size n with fx{x)dx = 1 (there is no probability mass at zero), A is 
a square matrix of size n x n and 1 is the column vector of ones of size n. In this case, we say that X is 
matrix-exponentially distributed with representation cc. A, or shortly, ME (a., A)-distributed. 


1 


In Definition 1 the elements of ot and A are real numbers without any specific restriction on their sign 
and the only restriction is that fx{x) is non-negative for a; > 0. We note that A and commute. 

For a given representation (a, A), the size n of o: (and A) is called the order of the representation. 

The representation of a given ME distribution is not unique. 

Theorem 1. [1] Let ME(a., A) of order n and ME('y, G) of order m be two ME distributions with pdf fx (x) 
and fyi^x), respectively. If A is n x n 

If there exists a matrix W of cardinality n x m such that 

7 = aW, AW = WG, 

then ME(a,A) = ME('y,G) (that is, fx{x) = frix)). 

Proof. 

fvix) = —‘^Ge^^lm = —= —a.Ae^^'Wlrn = —aAe^^ln = fx{x). 

□ 

Theorem 1 will be used as a representation transformation tool. The size of column vector 1 is explicitly 
indicated in the theorem as a subscript. 

Definition 2. A representation of an ME distribution has minimal order if the distribution has no repre¬ 
sentation of a smaller order. A representation of minimal order is referred to as a minimal representation. 

In a minimal representation, there are no “extra” or “redundant” eigenvalues in matrix A. More precisely 
a minimal representation has the following properties [10]: 

PI) All Jordan blocks of A have different eigenvalues. 

P2) All eigenvalues contribute to the distribution with maximal multiplicity. For example, a Jordan block 
of size Hi corresponding to eigenvalue —Xi results in the terms in fx{x), where 

7^ 0. 

P3) a. is not orthogonal to any of the right-eigenvectors of A. 

P4) 1 is not orthogonal to any of the left-eigenvectors of A. 

P5) The Jordan block structures of all minimal representations of an ME distribution are identical. 

These properties are explained further in Appendix B. Based on these properties, a minimal represen¬ 
tation can be obtained directly from fx- If / takes the form 

m Tii 

fix )= 

i=i 

where Xi are different and cx-^m 7 ^ 0, then we will consider the following representation (ct. A): 

0 ... 0 \ 

J 2 0 ... 0 

5 

0 Jm J 

1 ... 0 \ 

A, 1 ... 0 

0 A, y 

ng 

^==A1 = fx(x); 

this equation can be solved because the left-hand side contains all the terms x^~^e~^‘^ up to j < Ui for 

f = 1,..., m. 


A = 


where 


J, = 


/ Ji 
0 


V 0 

/ A, 
0 

V 0 


and Ji is of size ni. a can be calculated by solvi 

— OLC^ 
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Lemma 2. The representation (a, A) is minimal for fx- 

The proof is essentially due to properties P1-P5 and the fact that no Jordan block of size smaller than 
Ui can represent the term Appendix B elaborates more on this topic. 

If representation (a, A) is minimal then there are some straightforward necessary conditions for vector 
a. and matrix A to define a valid distribution: 

Cl) The eigenvalues of A have negative real part (to avoid divergence of fx{x) as a: ^ oo). 

C2) There is a real eigenvalue of A with maximal real part (to avoid oscillations to negative values as 
X —?> oo). 

C3) al = 1 (normalizing condition which ensures fx{x)dx = 1). 

C4) If for all i £ {0,1,... ,j — 1} the *th derivative of fx{x) is zero then the jth derivative of fx{x) is 
non-negative (to avoid decreasing behavior around x = 0). 

If any of these necessary conditions are violated then the tuple consisting of the vector a and matrix A does 
not define a valid ME distribution. Note that non-minimal representations might contain any additional 
eigenvalues, including for example positive ones. 

A subclass of ME distributions is the class of phase-type distributions (PH distributions). 

Definition 3. If X is an ME(ol,K) distributed random variable, where ol and A have the following prop¬ 
erties: 

• Qfi > 0, al = 1 

• An < 0, Aij > 0 for i ^ j, A1 < 0 

• A is non-singular, 

then we say that X is phase-type distributed with representation (a, A), or shortly, PH(ol,A) distributed. 

PH distributions can be interpreted as the time of absorption in a CTMC [6] and consequently the 
conditions of Definition 3 are sufficient for vector ct and matrix A to define a valid distribution. Vector ct 
or matrix A satisfying the conditions of Definition 3 are referred to as Markovian. 

The following properties are essential for the characterization of ME distributions. 

Definition 4. An ME(ol, A) distribution satisfies the dominant eigenvalue condition (DEC) if for some 
minimal representation ME('~f, GJ, G has a single eigenvalue with maximal real part. This eigenvalue is 
called the dominant eigenvalue. Its multiplicity may be higher than 1. 

Dehnition 4 excludes the case when a is the dominant real eigenvalue and there is a pair of complex 
eigenvalues with the same real part, for example a ± ib, where i is the imaginary unit. 

Properties PI-P5 ensure that if C1-C4 hold for one minimal representation ME(q;, A), they hold for all 
equivalent minimal representations. Additionally, if the dominant eigenvalue has multiplicity higher than I, 
then it belongs to a Jordan-block whose size is equal to the multiplicity of the dominant eigenvalue. 

Definition 5. The ME(a, A) distribution with density fx satisfies the positive density condition if fxix) > 
0 for all x G (0, oo). 

Theorem 3. [7] If fx is ME(ol, A) distributed, then fx has a finite dimensional PII(f3, H) representation 
iff the following two conditions hold: 

• ME(ol, A) satisfies the dominant eigenvalue condition; 

• fx satisfies the positive density condition. 

The original proof of O’Cinneide in [7] is rather complex, using Laplace-Stieltjes transform and geometric 
properties of the space of PH-distributions. In this paper we present an algorithm that gives a constructive 
and altogether more elementary proof, using function and matrix theory. 
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3 Procedure and proof 

Our main goal is an algorithm that provides a constructive proof for the sufficient direction of Theorem 3, 
that is, given that the dominant eigenvalue condition and the positive density condition hold for ME(a, A), 
find a PH-representation equivalent to ME(q:, A); in other words, find a vector-matrix pair (/3, B) where (3 
and B are Markovian and define the same distribution as ME(q;, A). 

This section is devoted to the algorithmic construction, also stating the theorems used along the way. 
Proofs are given in Appendix C. 

We also included a proof for the necessary direction of Theorem 3 in Appendix B. While the proof of 
the necessary direction is straightforward using the techniques in [7], we opted to include a self-contained, 
elementary proof that is more in line with the methods of the present paper. 


3.1 Sketch of the algorithm 

The algorithm consists of five main steps. Steps 1 and 2 are preparatory, and Step 5 is just correction related 
to Step 2. 

• Step 1. We find an equivalent minimal representation (q;i,Ai) for (a. A) if it is not minimal by 
eliminating any “extra” eigenvalues of A, which does not contribute to the pdf. We refer to Lemma 2 
and [ 1 ] for a straightforward and computationally stable method of finding a minimal representation. 

• Step 2. This step applies only if density is zero at 0, that is, /x(0) = 0. This step is essentially what 
may be called “deconvolution”: we represent fx as the convolution of some fy matrix exponential 
density function with /v'(O) > 0 and an appropriate Erlang-distribution Erlang(A:,/r) (see Lemma 4); 
if fy has a Markovian representation, then it gives a straightforward Markovian representation for 
fx as well (see Lemma 5). Thus we only need to find a Markovian representation for fy (and the 
corresponding representation, which is obtained from Lemma 4), where /v(0) > 0. If this step is 
applied. Steps 3 and 4 are applied for fy instead of fx, and we switch back to fx in Step 5. 

• Step 3. An equivalent representation ( 7 , G) is given with Markovian matrix G, while 7 may still have 
negative elements. The main tool of this step is the so-called monocyclic structure (with Feedback- 
Erlang blocks). Typically, the size of G is larger than that of A 2 (because each pair of complex 
conjugate eigenvalues is represented with at least 3 phases); that said, G is a sparse matrix with a 
simple block bi-diagonal structure. For this step only the dominant eigenvalue condition is necessary. 


• Step 4. 7 and G are transformed further into (3 and B where /3 is Markovian (and the Markovity of B is 
also preserved) essentially by adding an “Erlang-tail” (a number of sequentially connected exponential 
phases with identical rates) of proper order and rate to the monocyclic structure described by the 
Markovian matrix G. The main mathematical tool of this step is the approximation of elementary 
functions. Essentially, this last step is the contribution of the paper. The skeleton of this step is 
composed of the following elements: 


— Find r such that > 0 (element-wise). Such r always exists if the dominant eigenvalue and 

the positive density conditions hold and the pair ( 7 , G) results from the previous step. We 
remark that for a general representation, even if G is Markovian, such a r may not exist. This is 
further explained after Lemma 9. 

— Find A' such that 

7 -t- y ^ > 0 VA > A' 


which is always possible 

- Let e = infa,g(o,r)/x(a:). 
Find A" such that 


since ||7(I -b 7)’”'^ — 7e‘^'”|| —J- 0 as A 00. 

e > 0 because of the positive density condition and the result of Step 2 . 


— 7 e‘^'”Gl -b 7 



< e 


VA > A". 


This ensures that —7 (I-b^)^Gl > 0 for k = l,...,n where n = rA". This is always possible 
when e > 0 . 
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— Extend the ( 7 , G) representation with an Erlang tail of rate A = max(A', A") and order n = Ar. 

• Step 5. If Step 2 was applied, at this point we have a Markovian representation for fy- To switch back 
to fx, we use Lemma 5. If Step 2 was not applied, Step 5 does not apply either. 


3.2 Step 1: Minimal representation 

Starting from representation (a, A), we can obtain a minimal representation (cki, Ai) with the application 
of a representation minimization method. A minimal representation can be obtained through several ap¬ 
proaches. One possibility is directly from the pdf f{x) = as in Lemma 2. Another, computationally 

stable order reduction method is the Staircase method from [1], which uses singular value decomposition. In 
any case, the minimal representation (cci, Ai) enjoys properties PI-P5. 

There are two important properties that can be determined from a minimal representation (or the density 
function directly). These are the value and the multiplicity of the dominant eigenvalue and the validity of 
the dominant eigenvalue condition. We denote the dominant eigenvalue (which is real and negative) by —Ai 
and its multiplicity by ni. Indeed, Ai and ni determine the asymptotic rate of decay of the pdf: it decays 
like , where CAi.m is a positive constant, more precisely 


fix) 


lim , 

x—>-co ^6 


CAi,ni ■ 


3.3 Step 2: Positive density at zero 


In the case when ME(q:i, Ai) is such that fxix) > 0 for positive x values, but /x(0) = 0, then based on 
the following lemma, we represent ME(q:i, Ai) as the convolution of an Erlang distribution and a matrix 
exponential distribution ME(q: 2 , A 2 ) whose density is positive at 0. Actually, it turns out from the proof of 
the following lemma that Ai = A 2 . 

Lemma 4. If fx{x) = —is a matrix exponential pdf with 


fxix) >0 Vx > 0, 




= 0 1 = 0 ,...,/-!, 


x—0 


f^x\^) 


> 0 , 


X—0 


( 1 ) 


then fx can be written in the form 

fx = fy * gil, g-, ■), 

I I — 1 — g,x 

for some large enough /i, where g{l,g,x) = ^ \i-iy . — Erlang{l, p,) pdf, * denotes convolution and 

fy {x) is a matrix exponential function with 


fyix) >0 Vx > 0. 

The proof of Lemma 4 is given in the Appendix. The representation ( 0 : 2 , A 2 ) can be constructed either 
from fy via Lemma 2 or by using the fact that Ai = A 2 and calculating 0.2 from the appropriate linear 
equations. 

Lemma 4 and the following composition ensures that fxix) and fvix) have a Markovian representation 
and satisfy the dominant eigenvalue condition at the same time if /i > Ai. 

Lemma 5. If fvix) is ME distributed with representation ( 0 : 2 , A 2 ) of order m and g > Xi then 

fxix) = fyix) *g{l,g,x), 

and generator matrix 

\ 

5 

-g ga .2 
A2 

where the first I blocks of the matrix are of size one and the last block is of size m. Additionally, if ( 0 : 2 , A 2 ) 
is Markovian then {f3, B) is Markovian as well. 


is ME distributed with initial vector f3 = {1,0,0,..., 0} 

f -g g 

B = 


5 









-► 

(l-Zi)CJi 


b, 


Figure 1: FE-diagonal block. 




Figure 2: FE-diagonal representation of a generator with a real eigenvalue (cti) and a pair of complex ones. 


Proof. Based on the structure of B, the time to leave the first I phases is Erlang(l,/i) distributed and the 
time spent in the set of phases from / -|- 1 to to is ME (a, A) distributed. □ 

Based on Lemma 4 and 5 it remains to prove that the matrix exponential density function f{x) with 
/(O) > 0 satisfying the dominant eigenvalue and the positive density conditions has a Markovian represen¬ 
tation. 

3.4 Step 3: Markovian generator 

The aim of this subsection is to transform the potentially non-Markovian representation (a, A) of a ME 
distribution to a representation ( 7 , G) where G is a Markovian transient generator matrix satisfying the 
properties of the matrix of a PH distribution (Definition 3). Eor matrix G, we apply the matrix structure 
proposed in [5]. It is a block bi-diagonal matrix structure, where each block represents a real eigenvalue or 
a pair of complex conjugate eigenvalues of A. The blocks associated with real eigenvalue —Xi {—Xi < 0) are 
of size one, the diagonal element is —Xi and the first sub-diagonal element is Xi. The blocks associated with 
complex eigenvalues are composed by Eeedback-Erlang (EE) blocks. 

Definition 6. [5] A Feedback-Erlang (FE) block with parameters ( 6 , a, z) is a chain ofb states with transition 
rate a and one transition from the bth state to the first state, with rate za (c.f. Figure 1). The probability 
z G [0,1) is called the feedback probability. 

A FE block {b, a, z) with length 6 = 1 and z = 0 corresponds to a real eigenvalue —a and is referred to 
as degenerate EE blocks. Matrix G contains as many EE blocks (degenerate or non-degenerate) associated 
with a real eigenvalue or a pair of complex conjugate eigenvalues as the multiplicity of the eigenvalue. A 
non-degenerate FE block where 6 is odd has a real eigenvalue and (6 — l)/2 complex conjugate eigenvalue 
pairs. A non-degenerate FE block where 6 is even has 2 real eigenvalues and (6 — 2)/2 complex conjugate 
eigenvalue pairs. In both cases the eigenvalues are equidistantly located on a circle in the complex plane 
around —a. The dominant eigenvalue of the FE block (the one with the largest real part) with parameters 

( 6 , ( 7 , z) is always real and equals to r = —a ^1 — 2 ^^ [5]. Denote the eigenvalues of matrix A by — A^ ; the 

dominant eigenvalue (which is real) is — Ai. The FE blocks representing the eigenvalues are composed as 
follows 
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• if Xj is real, the corresponding FE block is a degenerate block; thus the parameters are: 

aj = Xj, bj = 1 , Zj = 0 , 

• if Aj = Qj ± icj {aj > Ai > 0, Cj > 0) is a complex conjugate pair, the parameters are: 


h,= 


2-k 


TT — 2 arctan 


— Ai + CL'^ 


(Jj = - [ — 2 aj — Cj tan j- Cj cot — 
= ( 1 - ( -at - Cj tan ^ ) / { 2 aj) 


where [x] denotes the smallest integer greater than or equal to x. 

This construction of the FE blocks ensures that Ai remains the dominant eigenvalue of matrix G, that 
is, the dominant eigenvalue of any FE block (r^) is less than — Ai except the one(s) associated with — Ai. 

Connecting the obtained FE blocks such that the exit transition of an FE block (whose rate is Aj(l — Zj), 
see Figure 1 , in case of non-degenerate FE block and Xj in case of a degenerate one) is connected to the first 
state of the next FE block composes a block bi-diagonal matrix (c.f. Figure 2 ). The obtained matrix G is 
Markovian and its Jordan form contains all Jordan blocks of matrix A. We order the FE blocks such that 
the first ni FE blocks are the ni degenerate FE blocks associated with — Ai. The order of the rest of the FE 
blocks are irrelevant. The FE blocks based finite Markovian representation of the eigenvalues of A is always 
feasible when the dominant eigenvalue condition holds. If there was a pair of complex conjugate eigenvalues 
aj ± icj which violates the dominant eigenvalue condition such that aj = Ai then the denominator of bj 
would be zero. 

Figure 2 depicts an example of a Markovian generator which is the monocyclic representation of a 
generator with a dominant real eigenvalue (—Ai = —cti) and a pair of complex conjugate eigenvalues in 
FE-diagonal form. In this representation there are two FE blocks, one of length bi = 1 with rate ai, and 
one of length 62 = 3 with rate a2 and feedback probability Z2- The associated generator matrix is 


G = 


/ 

-CTl 

Ui 

0 

0 \ 


0 

-(72 

(72 

0 


0 

0 

-(72 

(72 

V 

0 

za2 

0 

-(72 / 


In order to hnd an equivalent representation of ME(q:, A) with matrix G we need to compute vector 7, 
for which ME(a, A) = ME(7, G), with the help of Theorem 1 . Let n and m {n <m) be the order of A and 
G, respectively. Compute matrix W of size n x m as the unique solution of 

AW = WG, Wl = l, 


and based on W vector 7 is 


7 = aW. 

Since G is Markovian, the obtained (7, G) representation is Markovian if 7 is non-negative, but this is not 
necessarily the case. The case when 7 has negative elements is considered in the following subsection. 

3.5 Step 4: Markovian vector 

At this point in the algorithm, the ME distribution is described by representation (7, G) of order u which 
has a block bi-diagonal, Markovian matrix G, and a vector 7 with at least one negative element. In the next 
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step we extend the ( 7 , G) representation with an additional n phases in the following way. 


/ G -G1 \ 

-A A 


V -A / 


( 2 ) 


where B is of order u + n (the size of the upper left block of B is u, the remaining n blocks are of size one). 
— G1 is a non-negative column vector of size u. Due to the structural properties of G it contains exactly 
one non-zero element, which is the last element and it contains the exit rate from the last FE block. The 
transformation matrix W of size u x (u -I- n), which transform from representation ( 7 , G) to representation 
(/3, B) is the unique solution of GW = WB, = 1„. Fortunately, due to the special structure of 

matrix B, W is rather regular. 

Lemma 6 . W has the following form: 

w=((i+fri(i+?r‘^i(i+f)""^i-1^)’ 

where the size of the first block is u x u, the size of the remaining blocks is 1 x u. 

Proof. Substituting this expression of W into GW = WB and Wl„+„ = results in identities. □ 

Our goal is to find n and A such that (3 = 7 W is Markovian (that is non-negative), where 

rw = ( 71 + ?)” D (I+ ¥)”" h (I + ?)”■" 7^ b -1 7 ) ■ (3) 

The first block of this vector is of size u and the remaining n blocks are of size 1. We need to prove that 
this vector is nonnegative for an appropriate pair (A,n). 

Theorem 7. There exists a pair (A, n) such that 7 W is strictly positive. 

The rest of this subsection is devoted to proving Theorem 7. We assume everything that was done so far, 
for example that the dominant eigenvalue condition and the positive density condition hold, the density is 
positive at zero and also that the matrix G is Markovian and in monocyclic form such that the degenerate 
FE block(s) representing the dominant eigenvalue — Ai are the first one(s). Eirst we present a heuristic 
argument, then the formal proof. 


3.5.1 Heuristic argument 

A and n are typically chosen to be large (see [5]). However, finding an appropriate pair is not as simple as 
choosing some large A and a large n. Eor each n, the set of appropriate values of A forms a finite interval. 
If n is large enough, this interval is nonempty, but - without further considerations - it is impossible to 
identify this interval (or even one element of it). Vice versa, for each A there is a finite set of appropriate 
values for n. This means that the naive algorithm of increasing the values of n and A - without further 
considerations - may possibly never yield an appropriate pair. For this reason, we instead propose a different 
parametrization, which takes the dependence between n and A into account better. 

Let T = n/A. r turns out to be a value interesting in its own right. The ME pdf resulting from the pair 
( 7 W,B) has a term coming from the first block of B and it has n terms coming from the Erlang-tail. We 
argue that the terms coming from the Erlang-tail can be regarded as an approximation of the original pdf 
on the interval [ 0 ,r], while the term coming from the first block is some sort of correction that makes the 
approximation exactly equal to the original pdf. Each of the terms in the Erlang-tail contribute an Erlang 
pdf with rate A and order k £ [1,..., n] to the pdf. The Erlang(A, k) pdf is concentrated around the point 
j = ^- These points are situated along the interval [0,r] in an equidistant way with distance j. 

The weight (initial probability) of the Erlang pdf centered around the point -r is 
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Figure 3: Erlang pdf’s approximating the original one 


which means that the weights are approximately equal to samples of the original pdf at points —, fc G 
[1,..., n] divided by A, resulting in a pdf that is approximately equal to the original along the interval [0, r]. 

The first block of 7 W is different. From the form of B it is clear that the contribution of the first block 
is concentrated after the point t; the role of this block is essentially to make a correction in the interval 
[r, 00] , where the previous Erlang-approximation does not hold. 

Altogether the previous argument can be depicted nicely in Figures 3 and 4. We denote 

/ G \ ^ —G 1 

/fc(a;) = 7 (1+y j 9 {k,X,x), k = 0,...,n-l 

the approximating Erlang terms and 

G \ ” 

+ y j e"‘^“(-Gl) =1=5(71, A, x) 

the correction term. In Figure 3, the approximating Erlang terms roughly follow the graph of fx, while /o 
is concentrated after r. (The values are r = 3, A = 12 and n = 36; to make the figure visually apprehensible, 
only some of the approximating Erlang functions were included with slightly increased weights.) 

The value of A controls how concentrated the approximating Erlang pdf’s are and also controls how close 
their weights are to the sampling of the original pdf. Given that fx{x) > 0 for x > 0, this means that for 
any choice of t, the Erlang-approximation has positive weights if A is large enough. The choice of r is only 
important to make sure that the weights assigned to the correction term are also positive. Figure 4 shows 
an example where A is too small (notably A = 4). In this case, some of the approximating Erlang functions 
have negative coefficients. 

3.5.2 Formal proof 

Before the actual proof, some results are stated as standalone lemmas. Their proofs are in Appendix C. 

The first one is essentially a real approximation, so we state it in that form too, along with the matrix 
version which is useful for our purposes. Relevant properties of matrix (and vector) norms can be found in 
Appendix A. 


fo{x) =71 
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Figure 4: If A is too small, some Erlang pdf’s are negative 


Lemma 8. i) For any fixed r > 0 and positive integer n, 


sup 

\z\<r 

and the supremum is obtained at z = r. 
a) For any H square matrix, 


e — 


0 - 3 ’ 


< 


2 n ’ 


e^ - (I 


H 


< 


2n 


where r = ||H||. 

We state one more lemma. It identifies the main terms in e^^ when G is in monocyclic form. 

Lemma 9. 


^ z/m <j<u, 

( Gx\^ =0 2<i<u, l<j<u, 

X^CX) 

where Cj denote positive (combinatorial) constants and f(x) ^ g{x) denotes that lim,c^oo fix)/g{x) = 1. 
The last relation means that the first row dominates all other rows as t tends to infinity. 

, „ , 

Note that the last part of Lemma 9 is stated as —>■ 0; in fact, the elements (e are in a form 

similar to just with either the same exponential term and lower degree polynomial terms, or lower 

exponent (and in this case, the polynomial term does not matter). The actual exponents and polynomial 
terms, along with the constants Cj can be calculated explicitly from the proof of Lemma 9, but will not be 
used. 
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We emphasize that Lemma 9 relies heavily on the monocyclic structure of G, notably on the fact that 
the upper bi-diagonal elements (elements ( 1 , 2 ), ( 2 ,3),...) of the matrix are strictly positive. 

Now we are ready to prove Theorem 7. 


Proof of Theorem 7. 

We assume that the matrix exponential density function fx associated with representation ( 7 , G) satisfies 
/jc(0) > 0, the dominant eigenvalue and the positive density conditions, and that G is in monocyclic block 
structure with the first block corresponding to the dominant eigenvalue Ai. 

First we show that the first coordinate of -y, denoted by , is positive. 

If 7 ]^ = 0, then the multiplicity of —Ai is ni — 1 according to the structure of matrix G (see (7) in the 
proof of Lemma 9 in subsection C.3), which is in conflict with the fact that the multiplicity of — Ai in the 
minimal representation is ni. 

fx{x) is dominated by the first row of e*^^ for large values of x and consequently the sign of fx{x) is 
determined by 7 ^. The elements of are transient probabilities of the Markov chain with generator G, 
consequently they are non-negative. The elements of the first row of are strictly positive for a: > 0 
because the FE-blocks are connected that way that all states are reachable from the first state (cf. Figure 
2). According to Lemma 9 fx{x) is dominated by the first row of e^^ for large values of t and consequently 
the sign of fx{x) is determined by 7 ^. More precisely, Lemma 9 implies that 

0 < fx{x) = 7(-G)e®“l ~ C'Ai7ia;”i-ie-^i=" 


where C = J2j>ni Q > 0 and Ai > 0. 

Next we show that there exists a t such that 'je^'^ is positive. 

For the first row of 76 *^“ we have 

ifj<ni, 

( 76 °")^^. ^ if m < j < u, 

from Lemma 9. Thus 76 *^^ is positive if x is large enough. For a constructive procedure to find r, one can 
double x starting from ni/Ai as long as min( 7 e‘^^) < 0. It is not necessary to find the smallest x for which 
76 *^^ is nonnegative. 

After that we show that there exists A' such that 7(1 + > 0 for A > A'. 

Apply Lemma 8 with H = Gr and n = At to get that 




0 


as A —>■ cx), and consequently 




76 


Gr 


0 , 


meaning that 7(1 -I- x)”'’" strictly positive if A is large enough. Let ei = min( 7 e‘^”^); in accordance 

with Lemma 8 , define A' as the solution of 


{gxY" 


2At 


= ei- 


(4) 


where g = ||G||. Then 7 (I-f- x)”^^ > 0 for A > A', because the left-hand side is a strictly monotone decreasing 
function of A. Note that A' is explicitly computable from (4). 

Next we investigate the sign of the rest of the elements of vector 7 W. We apply Lemma 8 again, this 
time for H = ^ and n = k to get 


fcG 

e ^ 




< 


^^{kgf 


2feA2 


< e 


rgf9_ 

2A 
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uniformly in 0 < A: < Ar. 

Let €-2 = info<a;<T fx{x) = info<a;<T 7 e‘^^(—G)l. Since fx{0) > 0 as a result of Step 3 in Section 3.3, 62 
is strictly positive, due to the positive density condition. Let 14 be the A:-th coordinate of 7 W associated 
with the Erlang tail in (3); that is, 


Vfc = 7 



-G1 

A 


Then 




’ / GA'"' 



kG 

/ ga'" 


7 

r-r-)J 

G1 

< II7II 

- 

(■"-) 


IGIIIUII < 




Define A" as the solution of 

Il7l|e"^^5||l||=e2. (5) 

A" is also explicitly computable. (Note that || 1 || = 1, see Appendix A). For all A > A" we have 14 > 0 
because fx{j) > £2 and the difference between AI 4 and fx{j) is less than 62 - 

Putting these together, we get that for r and A = max(A',A") both parts of the vector 7W, that is, 
7 (I + and 7 (I + ^) for k = 0,1,... ,n — 1, are positive where n = [tA] and the obtained 

representation is indeed Markovian. 


3.6 Step 5: correction related to Step 2 

If Step 2 was applied, (/3, B) is actually a Markovian representation for fy] Lemma 5 ensures that 


/3' = {1,0,0,...,0} 

/-MM \ 


B' = 


V 


-M 


M/3 

B 


is a Markovian representation for fx(x) = fy^x) * 


4 Worked example 


Let 


= 1^(1 

/ -1 
0 


A = 


then 


/(x) = —aAe 


Ax 


0 
0 
0 
0 

V 0 


, 102, 

1 =-(xe 

139 


1 

-1 

0 

0 

0 

0 

0 


0 

0 

-1 

1 

0 

0 

0 


_ 2 
3 

0 

0 

4 

-1 

0 

0 

0 


_5 

2 

0 

0 

0 

0 

-4 

0 

0 


12 

17 

0 

0 

0 

0 

0 

-5 


14 
17 

0 \ 
0 
0 
0 
0 
3 


-3 -5 / 


^—3x 


- lOe 


—4x 


e (8cos(3x) + 4sin(3x))) 


The eigenvalues of A are —1 (with multiplicity 2), —3,—4,—5 + 3i, —5 — 3i and 1. The eigenvalue 1 is 
redundant: the corresponding right-eigenvector is orthogonal to a, thus it does not appear in the pdf. It is 
eliminated during Step 1. 
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After Step 1, a minimal representation is obtained: 


oti = 


Ai = 


102 

/ -1 
0 
0 
0 
0 

V 0 


1 1 
1 

-1 
0 
0 
0 
0 


1 _5 13+i 13-i \ 

3 2 17 17 / ’ 

0 0 0 0 \ 

0 0 0 0 

-3 0 0 0 

0-4 0 0 

0 0 -5 + 3i 0 

0 0 0 -5 - 3i / 


Since /(O) = 0, Step 2 needs to be applied. 


/(O) = 0 /'(O) = 7 > 0, 


so the value of k in Lemma 4 is fc = 1. Setting /i = 10, the transformed pdf after Step 2 (borrowing the 
notation fy from Lemma 4) is 


frix) = 


102 

1^ V 10 


^—3x 


10 


— 6e 


—4x 


13 


•g(-5+3i)a: _|_ 


13-i 


,(—5—3n)ai 


and the corresponding representation for fy is 

rv f A 1 2 _7_ _2^ 7 1 

^ 139 10 5 15 340 340 / ’ 

/ -1 1 0 0 0 0 \ 

0 -1 0 0 0 0 

_00 -3 0 0 0 

"^^“ 0 0 0 -4 0 0 

0 0 0 0 -5 + 3i 0 

\ 0 0 0 0 0 -5 - 3i/ 


From now on, we work with this representation. In Step 3, the eigenvalue pair 5 ± 3i is represented by a 
feedback-Erlang block. The order of this pair is 6 = 4, and the corresponding FE-block is 


/ -5 5 0 0 \ 

0-550 
0 0-55 


V ^ 0 0 -5 y 


Step 3 results in the representation 


7 

102 , 

315 

10733 

6641 


8399 

147 

67 

139 ^ 

2176 

21760 

32640 

21760 

680 

272 



/ -1 

1 

0 

0 

0 

0 

0 

0 \ 



0 

-1 

1 

0 

0 

0 

0 

0 



0 

0 

-3 

3 

0 

0 

0 

0 



0 

0 

0 

-4 

4 

0 

0 

0 

G 

— 

0 

0 

0 

0 

-5 

5 

0 

0 



0 

0 

0 

0 

0 

-5 

5 

0 



0 

0 

0 

0 

0 

0 

-5 

5 



1 0 

0 

0 

0 

81 

125 

0 

0 

-5 / 


45 225 1 

1088 1088 / ’ 


Since 7 still contains negative elements. Step 4 needs to be applied. 

Following the algorithm in the proof of Theorem 7, we obtain the following values: 

• T = 0.5 (from 76*^’’ > 0), 

• 5=||G|| = 10, 
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• Il7ll <1-5, 

• Cl > 0.05 (for T = 0.5), 

• A' = 112000 from (4), 

• e > 0.069, and thus A" = 806600 from (5). 

403300 we obtain a Markovian representation 

1, we obtain a Markovian representation for 
Note that the order of this representation is 


This means that applying Step 4 with A = 806600 and n = tA = 
for fy in the form of (2). 

Finally, Step 5 applies, so by Lemma 5 with /r = 10 and k = 
the original ME(q:, A). The representation is of order 403309. 
very far from minimal, but we do not pursue a minimal value. 


5 Conclusion 

We have proposed a constructive proof for O’Cinneide’s characterization theorem [7] along with an algorithm 
that always succeeds in finding a Markovian representation. The algorithm and the proof are divided into 
a few distinct steps, connecting some of the modern results in the field as well as introducing some original 
ideas using elementary function theory and matrix analysis. 
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A Vector and matrix norms 


We need some auxiliary facts about vector and matrix norms. First we define vector norms. Let u be a 
vector of size n. 

Definition 7. The 1-norm and oo-norm of v are 

n 

ll^lli = V kil, Halloo = max |ui|. 

l<i<n 

2=1 - 

Lemma 10. a) ||.||i and ||.||oo equivalent, i.e, 

ll^’lloo < ll^’lli < n ||u||oo. 

b) If V is a row vector and w a column vector, then 

|um| < llvlli ||m||oo, It-ml < ||t;||oo ||m||i. 

The fact that they are equivalent means that they define the same topology, so convergence to 0 is 
equivalent in either norm. For convenience, we will stick to using ||.||i for row vectors and ||.||oo for column 
vectors. 

We also need a matrix norm. 

Definition 8. The oo-norm of A is 

n 

||A|loo = m^ y ' |Aj| 

l<z<n ^^ 

“ “ i=i 

This is a submultiplicative norm: 

IIABIIo, < IIAIIo, ||B||o,. 

Actually, the above matrix norm is the induced matrix norm of the vector norm ||.||oo when multiplying 
a column vector with a matrix from the left, and the induced matrix norm of the vector norm ||.||i when 
multiplying a row vector with a matrix from the right. This means it works nicely with the previous vector 
norms. 

Lemma 11. Let v be a row vector and w a column vector of size n and A be an n x n matrix. Then 
ll'wAlli < ||u||i ||A||oo, ||Am||oo < ||A||oo NHoo, \vAw\ < ||u||i ||A||oo ||'w||oo. 

B Proofs for the necessary direction 

Definition 9. The Markovian (cx, A) representation of PH (a, A) is redundant if it contains at least one 
state which cannot be visited by the Markov chain with initial distribution a and generator A. Otherwise 
(a, A) is non-redundant. 

If the representation (a, A) is redundant then it is easy to identify and eliminate the redundant states. 
Consider the vector —q:A~^. The stochastic interpretation of its fth coordinate is the mean time spent in 
state i before absorption. If the ith element of vector — aA^^ is zero then state i is redundant and the 
associated elements can be deleted from vector ol and matrix A without modifying the distribution of time 
till absorption. 

Lemma 12. If X is PH(a,A) distributed and non-redundant, then the positive density condition holds, that 
is, 

fx{x) >0 Vx > 0. 
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Proof. If X is PH(q;,A) distributed and non-redundant then there is a path from every state with positive 
initial probability to the absorbing state and every state belongs to one of those paths. Consequently, the 
Markov chain is in state j at time x with positive probability, for any time cc > 0 and for any state j. Let 
state i be a transient state from where the absorption rate gi is positive. Then 

n 

fx{x) = ae^^{-A)l = ^ Pv{Z{x) = j)gj > Pr{Z{x) = i)gi > 0, 
i=i 

where Z{x) denotes the underlying Markov chain. □ 

Lemma 13. If X is PH(a,A) distributed and non-redundant, then the dominant eigenvalue condition holds. 

Before proving Lemma 13, we elaborate on Definition 2. Let ME{^, G) be a minimal representation for 
X. Consider its pdf using the Jordan-decomposition of G (G = PJP~^) 

i 

fx(x) = -TPJe-J'P-il = ^ - 7 P,J,e-J^"P'l, 

i=l 

where Ji denotes the Jordan-block corresponding to the eigenvalue —Xi and P^ denotes the submatrix of P 
containing only the columns corresponding to J^. P' denotes the submatrix of P“^ that contains only the 
rows corresponding to (thus P^ is of size n x rij, where is the multiplicity of —Xi and n is the size of G, 
and P' is of size x n). In P^, the first column of each block is the (unique, up to a constant factor) right 
eigenvector Vi corresponding to that eigenvalue and the other columns are generalized eigenvectors. Similarly 
in P', the last row of each block is the (unique, up to a constant factor) left eigenvector Ui corresponding to 
that eigenvalue and the rest of the rows are generalized eigenvectors. If i ^ j, then P'Pj = 0. 

The dominant term of e'^’“ is equal to (where Ui denotes the size of Ji), and it is situated in 

the upper right corner. Within —7PiJie'^*“P'1 this dominant term is obtained exactly when taking 

= {■^Vi)Xi— --^(itil). 

[m - 1)! 

If any of the coefficients (7Ui) and (itil) is 0, this term would vanish. Properties P3 and P4 ensure that 
this is not the case, in other words, all eigenvalues contribute to the pdf with maximal multiplicity (that is. 
Property P2). 

This allows us to prove the DEC for any (possibly non-minimal) Markovian representation (a. A) by 
proving that there exists a real eigenvalue of A that is strictly greater than the real part of all other 
eigenvalues AND this eigenvalue contributes to the pdf with maximal multiplicity. 

The proof of Lemma 13 is based essentially on the Perron-Frobenius lemma. We begin by citing the 
Perron-Frobenius lemma along with a necessary definition, see for example [4]. 

Definition 10. An n x n matrix M is reducible if there exists a nontrivial partition I U J of {1, 2,..., n} 
such that 

My=0 \/iGl,jeJ. 

Otherwise, M is irreducible. 

In case M is the transient generator of a PH distribution, then irreducibility means that each state can 
be reached from any other state before absorption, in this case we say that M has a single communicating 
class. If the Markov chain dehned by M has multiple communicating classes, they correspond to a partition 
of the states as in the above definition. 

Theorem 14 (Perron-Frobenius). If the irreducible matrix XA has nonnegative elements, then there exists 
a positive eigenvalue i>i of M such that 

• i>i has multiplicity 1, 

• vi > IzzilVi where vt denote the eigenvalues o/M, and 

• the corresponding right-eigenvector Vi is strictly positive (note thatvi is unique up to a constant factor; 
it can be chosen such that Vi is strictly positive). 

See Theorem 3 in [9] for a short, self-contained proof or Chapter 8 in [4] for a more detailed discussion. 
Note that the same conclusion holds for the left-eigenvector Ui as well. Note that the fact that is positive 
with multiplicity 1 and > \vi\ mean that ^{vf) < vi for i ^ 1. 
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Proof of Lemma 13. 


In case A has a single communicating class we apply Theorem 14 to the matrix M = A + wl, where 
uj = maxi \aii\. Given that the matrix A is Markovian, M is nonnegative with the same eigenvectors and 
the eigenvalues shifted by ui. The dominant eigenvalue vi of M corresponds to the dominant eigenvalue —Ai 
of A, that is = —Ai + w and the same relation holds for the other eigenvectors. Clearly for i 7^ 1 

< vi 5R(— Ai) < — Ai- 


If A has a single communicating class then Theorem 14 guarantees that the multiplicity of —Ai is 1; this 
means that the unique dominant term in the pdf is (Q;t;i)Aie“^^“(Mil). Strict positivity of Vi and tti ensure 
avi > 0 and ttil >0, so indeed Ai contributes to the pdf with multiplicity 1. 

If A has several communicating classes, the states can be renumbered such that A is an upper block 
triangular matrix, where each diagonal block corresponds to a communicating class and the blocks above 
the diagonal correspond to transitions between classes. The diagonal blocks are denoted by Bi,..., Bfc. The 
eigenvalues of A are the union of the eigenvalues associated with these diagonal blocks. Each B^ is itself the 
generator of a transient Markov chain, and, since B^ is also irreducible. Theorem 14 can be applied to each 
of them. It follows that each of these blocks (communicating classes) has its own dominant eigenvalue such 
that within that class, the real parts of all other eigenvalues are strictly smaller. It follows directly that the 
largest eigenvalue of A (denoted by —Ai) is real and has —Ai > 5R(—A^) for all Ai 7^ Ai. 

However, as opposed to the single class case, the multiplicity of — Ai may be higher than I. Also, there 
may be several eigenvectors corresponding to — Ai. This means that in order to calculate the contribution 
of —Ai to the pdf, we need to be slightly more meticulous. The proof is essentially a transformation of 
the matrix A to a form that is similar to the Jordan form (but not the same), while preserving some 
nonnegativity of A and a (where it is important). We also present a numerical example at the end of this 
section to demonstrate the steps of the proof. 

Let = Bi be the Jordan decomposition of B^. We assume that the first block of 3i is the 

single dominant eigenvalue of B^; Theorem 14 thus guarantees that the first column of Qi, which is the 
corresponding right eigenvector, is strictly positive, and the first row of which is the corresponding left 
eigenvector, is also strictly positive. Create the transformation matrix 


Qi 0 0 ... 0 

0 Q 2 0 ... 0 


0 ... 0 Qfc 


Then Q ^AQ is an upper triangular matrix that contains the eigenvalues of A in its diagonal. Applying 
this transformation to the pdf, we get 


fx{x) = -aAe^^l = -(aQ)(Q-^AQ)e(‘^ 

Take all rows and columns of Q~^AQ that have —Ai in the diagonal. Denote this submatrix by B. The 
submatrix B is responsible for the whole contribution of — Ai. B can be calculated as 

B = RQ^^AQR^ 

where R is a ni x n binary matrix (whose elements are either 0 or I) where ni is the multiplicity of the 
dominant eigenvalue in A and n is the size of A; row i in R is equal to the unit vector ej if the f-th 
instance of —Ai in the diagonal of Q“^AQ is at coordinate j,j. (ccQ) is strictly positive on the coordinates 
corresponding to B since the dominant eigenvector of Qi are strictly positive and the block of a associated 
with Qi is nonnegative and different from 0 (if it was 0 then PH(q:,A) would be redundant). Similarly, 
(Q'^l) is strictly positive on the coordinates corresponding to B. 

Finally, we argue that we can identify the dominant term in and see that it has a positive coefficient. 
This is done directly instead of transforming B to Jordan form. To this end, note that the offdiagonal 
elements of B are nonnegative since A originally contained nonnegative elements above the diagonal, which 
were then multiplied by the strictly positive dominant left and right eigenvectors of each block B^. 
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The matrix Ail + B is strictly upper triangular, thus nilpotent; this implies that the series expansion 

(Ail+B)x _ ((All + B)x)^ 
k\ 

k=0 

is actually a finite sum, and is a polynomial of x. The dominant term in is equal to the last 

nonzero term of this polynomial, multiplied by The coefficient of this term is necessarily positive 

since (Ail + B) and thus powers of (Ail + B) do not have negative elements. 

Consequently, we have proved that Ai contributes to the pdf 

fx{x) = -aAe^^l = -(aA)(Q^^AQ)e(‘^''^‘^)“(Q'H). 


with maximal multiplicity and with a positive coefficient, and the DEC holds. 


Example 1. Let 


-4 

1 

1 

0 

0.2 

0.4 

0 

0 

0 

0.4 

1 

-2 

1 

0 

0 

0 

0 

0 

0 

0 

2 

0 

-3 

0 

0 

0 

0 

0.2 

0.4 

0.2 

0 

0 

0 

-4 

3 

0.2 

0.2 

0 

0.4 

0 

0 

0 

0 

1 

-2 

0 

0.2 

0.2 

0 

0.2 

0 

0 

0 

0 

0 

-2 

1 

0 

1/5 

0 

0 

0 

0 

0 

0 

1 

-2 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

-8 

2 

0.6 

0 

0 

0 

0 

0 

0 

0 

6 

-7 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

-1 


A has 5 communicating classes: Bi has size 3 and dominant eigenvalue —1, B 2 , B 3 and B 4 are of size 2 
and their dominant eigenvalues are -1,-1 and —4 respectively; B 5 is of size 1 with dominant eigenvalue 
— 1. Thus Ai = 1. 
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-1 
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0 

0 

1 

-3 
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0 

0 
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0 

0 

0 

1 

1 
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0 
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0 
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0 
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0 
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0 

0 

0 
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0 
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1 
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0 

0 

0 

0 

0 

0 

0 

0 

3 

2 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 


Notice that in Q, the first column in each block is strictly positive. Even though it is not displayed in this 
example, Q (and Q~^AQJ may contain complex numbers, but only in rows and columns corresponding to 
non-dominant eigenvalues. 


Q'^AQ 


-1 

0 

0 

0.05 

0.05 

0.10 

- 0.10 

0.25 

0.20 

0.15 

0 

-3 

0 

0.10 

0.10 

0.20 

- 0.20 

0.50 

0.40 

0.30 

0 

0 

-5 

-0.15 

0.30 

-0.15 

-0.30 

0.25 

0.20 

-0.25 

0 

0 

0 

-1 

0 

0.25 

0.15 

0.35 

0 

0.15 

0 

0 

0 

0 

-5 

-0.05 

0.05 

-0.15 

-0.40 

0.05 

0 

0 

0 

0 

0 

-1 

0 

0.20 

0.30 

0 

0 

0 

0 

0 

0 

0 

-3 

- 0.20 

-0.30 

0 

0 

0 

0 

0 

0 

0 

0 

-4 

0 

9/35 

0 

0 

0 

0 

0 

0 

0 

0 

-11 

-6/35 

0 

0 

0 

0 

0 

0 

0 

0 

0 

-1 


The rows and columns that include the dominant eigenvalue are marked and so 


■ 1 

0 

0 

0 

0 

0 

0 

0 

0 

0 ■ 


■ -1 

0.05 

0.10 

0.15 ■ 

0 

0 

0 

1 

0 

0 

0 

0 

0 

0 


0 

-1 

0.25 

0.15 

0 

0 

0 

0 

0 

1 

0 

0 

0 

0 

, B = RQ~^AQR^ = 

0 

0 

-1 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

1 


0 

0 

0 

-1 
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The last nonzero power of the nilpotent matrix Ail + B is 


(Ail + B)2 = 


0 0 0.00125 0.0075 


0 0 
0 0 
0 0 


whose nonzero elements are all positive. 


C Proofs for the sufficient direction 

C.l Proof of Lemma 4. 

The intuitive behavior of the convolution of the pdf of a non-negative r.v. (Y) and the Erlang(Z,/i) pdf is 
the following: assume /y(0) > 0; for large values of /i, the Erlang pdf decays rapidly, so the function fy 
is very close to fx, except around 0, since convolution of a pdf fy with an Erlang(l,/x) pdf increases the 
multiplicity of 0 by 1. Lemma 4 utilizes this relation in the opposite direction. Hence if fx was positive 
everywhere except at 0 with multiplicity 1 , then fy will be positive at 0 and its positivity for R+ comes from 
the small difference from fx- (Actually, the tail and the main body of fy{x) will be examined separately 
for technical reasons.) 

fy can be calculated in the Laplace-transform domain as follows. The Laplace-transform of the 
Erlang(/, p,) pdf is 

h- 


fk.nis) = 


T 


Denote by fx{s) and /y(s) the Laplace-transform of fx and fy, respectively. Then from fx{x) = 
fvix) + fi,f,{x) we have /^(s) = /y(s) • , and so 


ms) = rxis) 


T 


— fxis) ( IH 


For 1 = 1, the inverse transform of fxis) -I- gives 


fvix) = fx{x) + - if'xix) + fx{0)) = fx{x) -f -f'x{x). 

p. fj. 


For I > 1, induction (or the binomial theorem) gives 

i 


= E (!) = -rE (!'l I - I 


The fact that fy{x) is a matrix-exponential pdf is straightforward from the above formula. Also, it has 
a representation of the form ME( 7 ', G) where 7 ' = (!) (^) ■ 

We fix a value 5 > 0 (independent from /i) such that 

f^\x)>0, xe(0,5]. 

This is possible since fx\^) > 0 and is continuous. This in turn implies by integration that 


fx\x)>0, a: G (0,(5]. 


for every i = I, I — 1,... ,1,0 and thus 


fvix) = 0 a: G (0,(5). 

i=o ^ 


19 






This holds for any value of fi. 

We examine the tail of fy next. Recall that as a; ^ cx), fx{x) decays as where 

^Ai,ni ^ 0- 


fvix) - fx{x) = Y, (■) - fx{x) = -'yY([ 




Since decays with rate 


-7( : -c,a;”^-"e 


ni — l—Xix 


for each i = 1 ,..., / for some constants Ci. 

Select Ki such that 

-7(')G*Ge°^l 


for z = 1,..., fc, Then 


< 2|cd Va: > Ki 


Ifvix) - fx{x)\ < Y 


Note that Ki is also independent from fx. 

The constant Yi=i is decreasing in Select fxo such that 


( 6 ) 


'^2|cd 1 

/ , ~ — > ^0- 
i=l ^ ^ 

Select K 2 such that 

fx{x) > Va; > K 2 . 

Set K = max(Ki, K 2 ). At this point, S and K are fixed (independently of ^), and for any ^ > fiQ it 
holds that 

fvix) > fx{x) - Ifvix) - fxix)\ > = 0 Vx > K. 

We now have positivity of fy at [0,5] and [K, 00 ]. For [i5, K], we use the formula (6) again, and note that 


sup 

xG[S,K] 




7QG*Ge*=^^l 


where sup,,g[, 5 _^] 
for any p, > ^ 1 , 


7(')G*Ge®“l 


is finite for each i = 1 ,..., while i 0, so there exists a fii such that 


\fYix)-fx (x) I < inf fx(x ), 
xe[s,K] 


which is positive due to the positive density condition (specifically that fx is strictly positive on a finite 
interval not containing 0). 

Selecting any /a > max(^i,/a 2 ) finishes the lemma. 
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C.2 Proof of Lemma 8. 

We will prove part i) first. 

We will begin by showing that the supremum is obtained for z = r. 
Series expansion gives 


A;=0 


where 


1 - n(n-l)...{n-k+l) jf < „ 


1 


if fc > n 


B{n,k) = 

Note the following properties of B{n, k): 

0 < B{n, k) < 1 y n, k; lira B{n,k) = 0 V k. 

n—^oo 

For every 2 with \z\ < r, we have 


e — 




00 u 




fc =0 


°° uife 






k =0 


k=Q 


Notice that the series expansion ensures e’' — (l + ^)” > 0, so we only need an upper bound on e’' — 
(l + Using the straightforward inequalities 


ln(l + a;) > a:- — (x > 0) and > 1 + x {x & . 


we get that 


(l + Ly = e'' - < e’' - = e’' (l - < g"- ^ 


2n 


We note that this estimate is asymptotically sharp as n —^ oo. 
For part ii)^ we use the series expansion again: 


- ( 1 + 


H 


yy -jjA: 

fc! 

,fc 


J2^Bin,k) 

fc=0 




k^O 




fc =0 


2n ’ 


where r = IIHI 


C.3 Proof of Lemma 9. 

According to the FE block composition of G it has the following block structure 


G = 


r Gn 

Gi2 

0 

_1 

G22 J 


(7) 


where 


— Ai 

Ai 0 


0 


0 

0 . 

. 0 ■ 

... 0 

— Ai Ai 


0 

, Gi2 = 

0 

0 . 

. 0 

1 - 

0 


0 

1 

1 


1 

0 . 

- 1 

0 


Gn = 


and G22 contains the rest of the FE blocks. The size of Gn is denoted by ni (which is the multiplicity of 
the dominant eigenvalue — Ai) and the size of G22 by 712- Let 


H = G + Ail, 
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and accordingly Hu = Gn + Ail, H 12 = G 12 and H 22 = G 22 + Ail, where I denotes the identity matrix of 
appropriate size. From H = G + Ail, it follows that 

gGa: _ g—AiXgHx 


and it is enough to investigate the dominant row of In the rest of the proof, (.)ii, (■)i 2 , (■)22 denote the 
corresponding matrix blocks (not single elements). The eigenvalues of H 22 have negative real parts. Their 
real parts are less than or equal to Ai — 5i(A2), where —A 2 is the eigenvalue with the second largest real part. 
From the series expansion of 


Hx 


^ n! 

n=0 


and the block triangular structure of H we have that the upper left block is 





where (Hux)” can be calculated explicitly: 


0 ... 0 (Aix)” 0 
0 ... 0 0 (Aix)" 


(Hiix)" = 


0 

0 


0 


0 

0 


(Aix)- 

0 


0 


with the nonzero elements being at positions (1, n + 1), (2, n + 2),.... Specifically, is 0 for n > ni, so 
the sum actually finite, and from the above form it is clear that (e^*)ii is upper diagonal, 

dominated by its first row, which of course also dominates (e^®) 2 i = 0. 

The rest of the proof is devoted to the elements of (e*^“)i 2 and (e*^“) 22 - For that, we need to examine 
(e“")i2.0 

n 

n\ 

n=0 

Here, 

n—1 

(H")i2 = ^(Hn)"Hi2(H22)"-"-' 

since H is an upper block bi-diagonal matrix. Thus 


00 n-l 


\n—k—l 


n.—k—l 


(e*^")i2 = ^^^(Hn)'=Hi2(H2 

n—1 ’ k—{) 

00 00 

= E(Hn)*Hi2 

A:—0 n—fc+1 

" 1-1 °° n 

==^(Hn)"'Hi2 ^ ^(H22)"-'=-'. 




n—k-\-l 


Again, the sum over k is finite. 

The inner sum can be calculated as 


1 


^ n\ 

n—k-\-l 


00 

1 y 1 

^ nl 

n—k-\-l 


X = X 


-k- 


■ '“-E 


1=0 
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and accordingly, 


(H22x)'= 

k\ 


7X / 

^ ^(H22)"-'=-' = (H22)-('=+') ( - I - H 22 X- 

n—/c+1 ^ 

Putting it all together, we obtain that 

(e*^")i2 = i] (Hn)"Hi2(H22)-("+') - I - H 22 X-• 

k^O ^ ^ 

The form of (Hii)^Hi 2 guarantees that for each k 

(Hn)"Hi2(H22)-("+'^ - I - H 22 X- 

has a single nonzero row, with k = ni — 1 corresponding to the first row being nonzero, fc = ni — 2 to the 
second etc. Within each row, the main term is 

-(Hn)"Hi2(H22)-("+')^^^^ = -^(Hn)"Hi2(H22)-'. 

k\ k\ 

Specifically, the main term in each element of the first row is of order and the order in the other 

rows is smaller within the block (e*^“)i 2 . 

We need to calculate calculated either via Cramer’s (which allows for calculating the 

constants Cj explicitly, but is left to the reader), or by using the following identity: 

nOO nOO 

Jr=0 Jr=0 

The integral exists because all eigenvalues of H 22 have negative real part, is a positive func¬ 
tion (“weight”) and contains the transition probabilities of a CTMC, so all elements of 

are positive for all t > 0. Thus all elements of are negative, and the single nonzero row of 

-(Hn)feHi 2 (H 22 )~(^+^) is strictly positive. 

Finally, since the block (H )22 has eigenvalues with negative real part, the elements of (e ^“’)22 decay 
exponentially, so they are of course dominated by the first row of (e^’’^)i 2 . 

One last remark: [5, page 771] discusses the same statement in a rather descriptive manner using com¬ 
municating classes. 
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